---
title: LSTM vs. Dense neural networks for time-series
author: Bruce Meng
date: '2018-06-17'
slug: lstm-vs-dense-neural-networks-for-time-series
categories: []
tags:
  - R
  - modelling
---

This is a continuation of my last post, which compared an automatic neural network from the `forecast` package with a manual Keras model.

In that post I used a fully connected deep neural network to model sunspots. There’s another type of model, the long short-term memory network (LSTM), that is widely considered to be excellent at time-series prediction. Let’s run through a comparison of the deep feed-forward neural network established in the prior post with an LSTM model.

## Dataset

We’ll reuse the sunspots dataset since it’s one of the better ones (it’s long and exhibits nice seasonal patterns).
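The loading code isn’t shown in the post; a minimal sketch, assuming the built-in `datasets::sunspot.month` series and a hold-out of the final 299 observations (matching the 299 recursive predictions made later):

```r
# Monthly sunspot numbers, built into R (datasets::sunspot.month);
# assumed here since the post doesn't show its loading code
sun <- sunspot.month

# Hold out the final 299 observations as the test set
n.test <- 299
train  <- head(sun, length(sun) - n.test)
test   <- tail(sun, n.test)
```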

And this is the testing data we will evaluate our models against:

## Dense layers

First up is the dense network.

Step 1 - we will need to manually prepare the dataset into a format that Keras can understand. The code is a bunch of scaling, centering and turning the data from a tibble/data.frame to a matrix. I will skip showing that section as I suspect you’ll find it boring and it takes up quite a bit of room.
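For completeness, here is a minimal sketch of that skipped step, assuming the same transforms the prediction loop later inverts: a Box-Cox transform (with lambda `l`) followed by min-max scaling, and conversion to the plain matrix Keras expects. The toy values are stand-ins, not the actual sunspot data.

```r
x.raw <- c(5, 40, 115, 70, 12)   # toy stand-in for the sunspot series

l <- 0.5                          # assumed Box-Cox lambda
x.bc <- (x.raw^l - 1) / l         # Box-Cox transform (for l != 0)

# Min-max scale to [0, 1]; these two values are what the
# prediction loop later uses to invert the scaling
range.min.step <- min(x.bc)
range.max.step <- max(x.bc)
x.scaled <- (x.bc - range.min.step) / (range.max.step - range.min.step)

# Keras wants a plain numeric matrix, not a tibble/data.frame
x.mat <- matrix(x.scaled, ncol = 1)
```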

Step 2 - we can now construct a Keras model:

# Model params
units <- 256
inputs <- 1

# Create model
model.dense <- keras_model_sequential()

model.dense %>%
        layer_dense(units = units,
                    input_shape = c(lookback),
                    batch_size = inputs,
                    activation = "relu") %>%
        layer_dense(units = units / 2,
                    activation = "relu") %>%
        layer_dense(units = units / 8,
                    activation = "relu") %>%
        layer_dense(units = 1)

# Compile model (mean absolute error is a more meaningful metric
# than accuracy for a regression problem)
model.dense %>% compile(optimizer = "rmsprop",
                  loss = "mean_squared_error",
                  metrics = "mae")

Step 3 - we can now train the model; the fitted model’s summary is shown below:

## Model
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_1 (Dense)                  (1, 256)                      37120       
## ___________________________________________________________________________
## dense_2 (Dense)                  (1, 128)                      32896       
## ___________________________________________________________________________
## dense_3 (Dense)                  (1, 32)                       4128        
## ___________________________________________________________________________
## dense_4 (Dense)                  (1, 1)                        33          
## ===========================================================================
## Total params: 74,177
## Trainable params: 74,177
## Non-trainable params: 0
## ___________________________________________________________________________
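The parameter counts can be sanity-checked by hand: a dense layer has inputs × units weights plus units biases. Working backwards from dense_1’s 37,120 parameters gives a lookback of 144 (144 × 256 + 256 = 37,120):

```r
units    <- 256
lookback <- 144  # implied by dense_1's parameter count

# weights (n.in * n.out) plus biases (n.out) for a dense layer
dense.params <- function(n.in, n.out) n.in * n.out + n.out

p1 <- dense.params(lookback, units)       # dense_1
p2 <- dense.params(units, units / 2)      # dense_2
p3 <- dense.params(units / 2, units / 8)  # dense_3
p4 <- dense.params(units / 8, 1)          # dense_4
```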

Step 4 - we can now make predictions from the model:

## Predict based on last observed sunspot number
n <- 299  # number of predictions to make

predictions <- numeric()  # vector to hold predictions

# Generate predictions, starting with last observed sunspot number and feeding
# new predictions back into itself
for(i in 1:n){
    pred.y <- x[(nrow(x) - inputs + 1):nrow(x), 1:lookback]
    dim(pred.y) <- c(inputs, lookback)
    
    # forecast
    fcst.y <- model.dense %>% predict(pred.y, batch_size = inputs)
    fcst.y <- as_tibble(fcst.y)
    names(fcst.y) <- "x"
    
    # Add to previous dataset data.tibble.rec
    data.tibble.rec.dense <- rbind(data.tibble.rec.dense, fcst.y)
    
    ## Recalc lag matrix
    # Setup a lagged matrix (using helper function from nnfor)
    data.tibble.rec.lag <- nnfor::lagmatrix(data.tibble.rec.dense$x, 0:lookback)
    colnames(data.tibble.rec.lag) <- paste0("x-", 0:lookback)
    data.tibble.rec.lag <- as_tibble(data.tibble.rec.lag) %>%
            filter(!is.na(.[, ncol(.)])) %>%
            as.matrix()
    
    # x is input (lag), y is output, multiple inputs
    x <- data.tibble.rec.lag[, 2:(lookback + 1)]
    dim(x) <- c(nrow(x), ncol(x))
    
    y <- data.tibble.rec.lag[, 1]
    dim(y) <- length(y)
    
    # Invert recipes
    fcst.y <- fcst.y * (range.max.step - range.min.step) + range.min.step
    
    # save prediction (undo the Box-Cox transform applied in preprocessing)
    predictions[i] <- InvBoxCox(fcst.y$x, l)
}
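The `nnfor::lagmatrix` + NA-filter step inside that loop can be reproduced in base R with `embed()`, which may make the loop easier to follow. A sketch on a toy series with a lookback of 3 (not the real sunspot data):

```r
s <- c(10, 20, 30, 40, 50, 60)  # toy series
lookback <- 3

# embed() builds rows [x_t, x_{t-1}, ..., x_{t-lookback}],
# already dropping the rows lagmatrix() would fill with NA
lag.mat <- embed(s, lookback + 1)
colnames(lag.mat) <- paste0("x-", 0:lookback)

x <- lag.mat[, 2:(lookback + 1)]  # inputs: lagged values
y <- lag.mat[, 1]                 # target: current value
```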

## LSTM

Step 1 - we need to slightly tweak the data for the LSTM, since an LSTM expects a 3D tensor (samples, timesteps, features) instead of the 2D tensor used by the prior model.
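Assuming `x` already holds the 2D (samples × lookback) matrix from the dense preparation, adding the trailing features axis is a one-line `dim()` change (toy sizes shown):

```r
# toy 2D input: 5 samples, lookback of 3
x <- matrix(1:15, nrow = 5, ncol = 3)

# add a trailing "features" axis of size 1 for the LSTM:
# (samples, timesteps) -> (samples, timesteps, 1)
x.lstm <- x
dim(x.lstm) <- c(nrow(x), ncol(x), 1)
```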

Step 2 - we can now construct a Keras model:

# Model params
units <- 1
inputs <- 1

# Create model
model.lstm <- keras_model_sequential()

model.lstm %>%
        layer_cudnn_lstm(units = units,
                         input_shape = c(lookback, 1),
                         batch_size = inputs,
                         stateful = TRUE,
                         return_sequences = FALSE
                         ) %>%
        layer_dense(units = 1)

# Compile model (mean absolute error is a more meaningful metric
# than accuracy for a regression problem)
model.lstm %>% compile(optimizer = "rmsprop",
                  loss = "mean_squared_error",
                  metrics = "mae")

Step 3 - we can now train the model; the fitted model’s summary is shown below:

## Model
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## cu_dnnlstm_1 (CuDNNLSTM)         (1, 1)                        16          
## ___________________________________________________________________________
## dense_5 (Dense)                  (1, 1)                        2           
## ===========================================================================
## Total params: 18
## Trainable params: 18
## Non-trainable params: 0
## ___________________________________________________________________________
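This tiny parameter count also checks out. A cuDNN LSTM has 4 × units × (input_dim + units) weights plus 8 × units biases (cuDNN keeps two bias vectors per gate), so with one unit and one input feature that’s 4 × 1 × 2 + 8 = 16, plus 2 for the final dense layer:

```r
units     <- 1
input.dim <- 1

# cuDNN LSTM: 4 gates, each with input weights, recurrent weights,
# and two bias vectors
lstm.params  <- 4 * units * (input.dim + units) + 8 * units
dense.params <- units * 1 + 1  # final 1-unit dense layer
```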

Step 4 - we can now make predictions from the model:

## Predict based on last observed sunspot number
predictions.lstm <- numeric()  # vector to hold the LSTM predictions

# Generate predictions.lstm, starting with last observed sunspot number and feeding
# new predictions.lstm back into itself
for(i in 1:n){
    pred.y.lstm <- x.lstm[(nrow(x.lstm) - inputs + 1):nrow(x.lstm), 1:lookback, 1]
    dim(pred.y.lstm) <- c(inputs, lookback, 1)
    
    # forecast
    fcst.y.lstm <- model.lstm %>% predict(pred.y.lstm, batch_size = inputs)
    fcst.y.lstm <- as_tibble(fcst.y.lstm)
    names(fcst.y.lstm) <- "x"
    
    # Add to previous dataset data.tibble.rec
    data.tibble.rec.lstm <- rbind(data.tibble.rec.lstm, fcst.y.lstm)
    
    ## Recalc lag matrix.lstm
    # Setup a lagged matrix.lstm (using helper function from nnfor)
    data.tibble.rec.lag <- nnfor::lagmatrix(data.tibble.rec.lstm$x, 0:lookback)
    colnames(data.tibble.rec.lag) <- paste0("x-", 0:lookback)
    data.tibble.rec.lag <- as_tibble(data.tibble.rec.lag) %>%
            filter(!is.na(.[, ncol(.)])) %>%
            as.matrix()
    
    # x.lstm is input (lag), y.lstm is output, multiple inputs
    x.lstm <- data.tibble.rec.lag[, 2:(lookback + 1)]
    dim(x.lstm) <- c(nrow(x.lstm), ncol(x.lstm), 1)
    
    y.lstm <- data.tibble.rec.lag[, 1]
    dim(y.lstm) <- length(y.lstm)
    
    # Invert recipes
    fcst.y.lstm <- fcst.y.lstm * (range.max.step - range.min.step) + range.min.step
    
    # save prediction (undo the Box-Cox transform applied in preprocessing)
    predictions.lstm[i] <- InvBoxCox(fcst.y.lstm$x, l)
}

## Results!

Ok, let’s see some results! (Since we have two models, I’m also going to sneak in an ensemble model, which is simply the average of the dense and LSTM model predictions.)
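The ensemble is just the element-wise mean of the two prediction vectors, scored with RMSE. A sketch with toy numbers (not the actual model outputs):

```r
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

# toy stand-ins for the real test data and model predictions
actual           <- c(60, 55, 50, 48)
predictions      <- c(58, 57, 45, 50)  # dense
predictions.lstm <- c(64, 51, 52, 45)  # LSTM

# ensemble: simple average of the two models
predictions.ens <- (predictions + predictions.lstm) / 2

rmse(actual, predictions)
rmse(actual, predictions.lstm)
rmse(actual, predictions.ens)
```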


The results are fairly good for all three models. Dense looks superior for the first two-thirds of the horizon, while LSTM appears better in the last third. Let’s see some scores (RMSE) to settle this more definitively:

| RMSE Dense | RMSE LSTM | RMSE Ensemble |
|-----------:|----------:|--------------:|
|   33.35586 |  35.54449 |      26.25595 |

The ensemble model takes it, with the really good dense model coming in second. The LSTM was just a tad worse; with more tuning of the hyperparameters it might improve.